Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application. No claims are amended herein. 

Listing of Claims: 

What is claimed is: 

1. (original) An apparatus comprising: 

a first processor to execute a main thread instruction stream that includes a delinquent 
instruction; 

a second processor to execute a helper thread instruction stream that includes a subset of 
the main thread instruction stream, wherein the subset includes the delinquent instruction; 

wherein said first and second processors each include a private data cache; 

a shared memory system coupled to said first processor and to said second processor; and 

logic to retrieve, responsive to a miss of requested data for the delinquent instruction in 
the private cache of the second processor, the requested data from the shared memory 
system; 

the logic further to provide the requested data to the private data cache of the first 
processor. 
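As an informal illustration only, and not part of the claim language, the data flow recited in claim 1 can be sketched in software. The names `Core`, `handle_helper_miss`, and `fill_helper` are hypothetical, the caches and shared memory system are modeled as plain dictionaries, and the optional `fill_helper` flag loosely mirrors the additional fill of the second processor's cache recited in claim 6.

```python
class Core:
    """A processor with a private data cache, modeled as an address-to-value dict."""

    def __init__(self, name):
        self.name = name
        self.private_cache = {}


def handle_helper_miss(address, main_core, helper_core, shared_memory, fill_helper=False):
    """On a helper-core miss, fetch from shared memory and fill the main core's cache.

    Hypothetical model: real hardware logic would act on cache lines, not dict entries.
    """
    if address in helper_core.private_cache:
        return None  # no miss in the helper's private cache, so nothing to do
    data = shared_memory[address]              # retrieve from the shared memory system
    main_core.private_cache[address] = data    # provide to the main core's private cache
    if fill_helper:
        helper_core.private_cache[address] = data  # optional fill of the helper's cache
    return data


main, helper = Core("main"), Core("helper")
shared = {0x40: "payload"}
handle_helper_miss(0x40, main, helper, shared)
```

In this sketch the main core's cache holds the data after the helper's miss, even though the main core never requested it.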

2. (original) The apparatus of claim 1, wherein: 

the first processor, second processor and logic are included within a chip package.

3. (original) The apparatus of claim 1, wherein:
the shared memory system includes a shared cache. 

4. (original) The apparatus of claim 3, wherein: 

the shared memory system includes a second shared cache. 

5. (original) The apparatus of claim 3, wherein: 
the shared cache is included within a chip package. 

6. (original) The apparatus of claim 1, wherein: 

the logic is further to provide the requested data from the shared memory system to the 
private data cache of the second processor. 

7. (original) The apparatus of claim 1, wherein: 

said first and second processors are included in a plurality of n processors, where n > 2; 
each of said plurality of processors is coupled to the shared memory system; and 
each of said n plurality of processors includes a private data cache. 

8. (original) The apparatus of claim 7, wherein: 

the logic is further to provide the requested data from the shared memory system to each 
of the n private data caches. 

9. (original) The apparatus of claim 7, wherein: 

the logic is further to provide the requested data from the shared memory system to a 
subset of the n private data caches, the subset including x of the n private data caches, where 
0 < x < n.

10. (original) The apparatus of claim 1, wherein:

the first processor is further to trigger the second processor's execution of the helper 
thread instruction stream responsive to a trigger instruction in the main thread instruction 
stream. 

11. (original) An apparatus comprising: 

a first processor to execute a main thread instruction stream that includes a delinquent 
instruction; 

a second processor to execute a helper thread instruction stream that includes a subset of 
the main thread instruction stream, wherein the subset includes the delinquent instruction; 

wherein said first and second processors each include a private data cache; and 

logic to retrieve, responsive to a miss of requested data for the delinquent instruction in a 
first one of the private data caches, the requested data from the other private data cache if 
said requested data is available in the other private data cache; 

the logic further to provide the requested data to the first private data cache. 

12. (original) The apparatus of claim 11, further comprising: 

a shared memory system coupled to said first processor and to said second processor; 

wherein said logic is further to retrieve the requested data from the shared memory 
system if the requested data is not available in the other private data cache. 

13. (original) The apparatus of claim 11, wherein: 
the logic is included within an interconnect, wherein the interconnect is to provide
networking logic for communication among the first processor, the second processor, and the 
shared memory system. 

14. (original) The apparatus of claim 13, wherein: 

the first and second processor are each included in a plurality of n processors; and 

the interconnect is further to concurrently broadcast a request for the requested data to 
each of the n processors and to the shared memory system.

15. (original) The apparatus of claim 11, wherein:
the memory system includes a shared cache. 

16. (original) The apparatus of claim 15, wherein: 
the memory system includes a second shared cache. 

17. (original) The apparatus of claim 11, wherein:

the first processor is further to trigger the second processor's execution of the helper
thread instruction stream responsive to a trigger instruction in the main thread instruction
stream.

18. (original) A method comprising: 

determining that a helper core has suffered a miss in a private cache for a load instruction 
while executing a helper thread; and 

prefetching load data for the load instruction into a private cache of a main core. 

19. (original) The method of claim 18, wherein prefetching further comprises: 
retrieving the load data from a shared memory system; and
providing the load data to the private cache of the main core. 

20. (original) The method of claim 18, further comprising: 

providing load data for the load instruction from a shared memory system into the private 
cache of the helper core. 

21. (original) The method of claim 18, further comprising:

providing load data for the load instruction from a shared memory system into the private 
cache for each of a plurality of helper cores. 

22. (original) The method of claim 18, wherein prefetching further comprises: 
retrieving the load data from a private cache of a helper core; and 

providing the load data to the private cache of the main core. 

23. (original) The method of claim 18, wherein prefetching further comprises: 
concurrently:

broadcasting a request for the load data to each of a plurality of cores; and 
requesting the load data from a shared memory system. 

24. (original) The method of claim 23, wherein prefetching further comprises: 
providing, if the load data is available in a private cache of one of the plurality of
cores, the load data to the main core from the private cache of one of the plurality of
cores; and

providing, if the load data is not available in a private cache of one of the plurality of
cores, the load data to the main core from the shared memory system.
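As an informal illustration only, and not part of the claim language, the concurrent lookup of claims 23 and 24 can be sketched as follows. The function name `prefetch_for_main` and the dictionary-based caches are hypothetical, a thread pool stands in for the concurrent broadcast, and the sketch assumes `None` is never valid load data.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch_for_main(address, main_cache, peer_caches, shared_memory):
    """Fill main_cache[address] from a peer private cache if possible, else shared memory.

    Hypothetical model: returns "peer" or "shared" to show where the data came from.
    """
    with ThreadPoolExecutor() as pool:
        # Issue the broadcast to every peer cache and the shared-memory request together.
        peer_lookups = [pool.submit(cache.get, address) for cache in peer_caches]
        memory_lookup = pool.submit(shared_memory.get, address)
        peer_hits = [lookup.result() for lookup in peer_lookups]
        memory_data = memory_lookup.result()

    # Prefer data held in a peer private cache; otherwise fall back to shared memory.
    found_in_peer = any(hit is not None for hit in peer_hits)
    data = next((hit for hit in peer_hits if hit is not None), memory_data)
    main_cache[address] = data
    return "peer" if found_in_peer else "shared"


main_cache = {}
source_a = prefetch_for_main(0x10, main_cache, [{0x10: "from-peer"}, {}], {0x10: "from-shared"})
source_b = prefetch_for_main(0x20, main_cache, [{}, {}], {0x20: "from-shared"})
```

Here the first call is satisfied from a peer private cache and the second from the shared memory system, matching the two alternatives recited in claim 24.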

25. (original) An article comprising: 

a machine-readable storage medium having a plurality of machine accessible instructions, 
which if executed by a machine, cause the machine to perform operations comprising: 
determining that a helper core has suffered a miss in a private cache for a load 
instruction while executing a helper thread; and 

prefetching load data for the load instruction into a private cache of a main core. 

26. (original) The article of claim 25, wherein: 

the instructions that cause the machine to prefetch load data further comprise instructions 
that cause the machine to:

retrieve the load data from a shared memory system; and 

provide the load data to the private cache of the main core. 

27. (original) The article of claim 25, further comprising: 

a plurality of machine accessible instructions, which if executed by a machine, cause the 
machine to perform operations comprising: 

providing load data for the load instruction from a shared memory system into the 
private cache of the helper core. 

28. (original) The article of claim 25, further comprising: 

a plurality of machine accessible instructions, which if executed by a machine, cause the 
machine to perform operations comprising: 
providing load data for the load instruction from a shared memory system into the
private cache for each of a plurality of helper cores. 

29. (previously amended) The article of claim [[24]] 25, wherein: 

the instructions that cause the machine to prefetch load data further comprise instructions 
that cause the machine to:

retrieve the load data from a private cache of a helper core; and 

provide the load data to the private cache of the main core. 

30. (previously amended) The article of claim [[24]] 25, wherein: 

the instructions that cause the machine to prefetch load data further comprise instructions 
that cause the machine to:

concurrently: 

broadcast a request for the load data to each of a plurality of cores; and 
request the load data from a shared memory system. 

31. (original) The article of claim 25, wherein: 

the instructions that cause the machine to prefetch load data further comprise instructions 
that cause the machine to: 

provide, if the load data is available in a private cache of one of the plurality of cores,
the load data to the main core from the private cache of one of the plurality of cores; and

provide, if the load data is not available in a private cache of one of the plurality of 
cores, the load data to the main core from the shared memory system.

32. (original) A system comprising:

a memory system that includes a dynamic random access memory; 

a first processor, coupled to the memory system, to execute a first instruction stream; 

a second processor, coupled to the memory system, to concurrently execute a second 
instruction stream; and 

helper threading logic to provide fill data prefetched by the second processor to the first 
processor. 

33. (original) The system of claim 32, wherein: 

the helper threading logic is further to push the fill data to the first processor before the 
fill data is requested by an instruction of the first instruction stream. 

34. (original) The system of claim 32, wherein: 

the helper threading logic is further to provide the fill data to the first processor from a 
private cache of the second processor. 

35. (original) The system of claim 32, wherein: 

the helper threading logic is further to provide the fill data to the first processor from the 
memory system. 

36. (original) The system of claim 32, further comprising: 

an interconnect that manages communication between the first and second processors. 

37. (original) The system of claim 32, wherein: 

the memory system includes a cache that is shared by the first and second processors. 